Part 2.3: Data Visualization and Impact Analysis¶

Modular Functions for Impact Analysis & Visualization

This section provides reusable, parameterized functions for analyzing and visualizing performance metrics across temporal and categorical dimensions. Designed for flexibility and clarity, the functions support:

  • Dynamic grouping by time (year, month, day) or category (store, promo, etc.)

  • Preprocessing filters to exclude non-operational records (e.g., closed stores, zero-sales days)

  • Statistical summaries including mean, standard deviation, and count

  • Ranked insights with volatility and performance differentials

  • Interactive visualizations via Plotly for enhanced interpretability

These tools enable scalable impact assessments and trend analyses across diverse datasets with minimal code repetition.

1. Setup & Import Libraries¶


In [1]:
import time 
In [2]:
# Step 1: Setup & Import Libraries
print("Step 1: Setup and Import Libraries started...")
time.sleep(1)  # Simulate processing time
Step 1: Setup and Import Libraries started...
In [3]:
# Data Manipulation & Processing
import math
import numpy as np
import pandas as pd
import scipy.stats as stats
from datetime import datetime
from sklearn.preprocessing import StandardScaler, MinMaxScaler, LabelEncoder  # explicit imports instead of a wildcard

# Data Visualization
import seaborn as sbn
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

import plotly.io as pio
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

from pandas.plotting import scatter_matrix

# to ensure Plotly works in both Jupyter and HTML export
pio.renderers.default = "notebook+plotly_mimetype"

plt.style.use('seaborn-v0_8')            # apply the style first so it doesn't reset the rc overrides below
sbn.set(rc={'figure.figsize': (14, 6)})
sbn.set_palette("husl")

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)
pd.set_option('display.float_format','{:.2f}'.format)

# Suppress warnings for cleaner notebook output
import warnings
warnings.filterwarnings('ignore')
In [4]:
print("="*60)
print("Rossmann Store Sales Time Series Analysis - Part 2")
print("="*60)
print("All libraries imported successfully!")
print("Analysis Date:", pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S'))
============================================================
Rossmann Store Sales Time Series Analysis - Part 2
============================================================
All libraries imported successfully!
Analysis Date: 2025-08-13 21:22:59
In [5]:
print("✅ Setup and Import Libraries completed.\n")
✅ Setup and Import Libraries completed.

In [6]:
# Start Impact Analysis

viz_impact_analysis_begin = pd.Timestamp.now()

bold_start = '\033[1m'
bold_end = '\033[0m'

print("🔍 Viz Impact Analysis started ...")
print(f"🟢 Begin Date: {bold_start}{viz_impact_analysis_begin.strftime('%Y-%m-%d %H:%M:%S')}{bold_end}\n")
🔍 Viz Impact Analysis started ...
🟢 Begin Date: 2025-08-13 21:23:00

Restore the DataFrame¶


In [7]:
%store -r df_viz_feat

Preview the Dataset¶

In [8]:
print("\nTrain Data Preview:")
print("\n",df_viz_feat.head())
Train Data Preview:

         store  dayofweek       date  sales  customers  open     promo stateholiday  schoolholiday  day  week month  quarter  year  isweekend  isholiday  isschoolDay
982643   1115          2 2013-01-01      0          0     0  No Promo       Public              1  Tue     1   Jan        1  2013      False       True        False
982640   1112          2 2013-01-01      0          0     0  No Promo       Public              1  Tue     1   Jan        1  2013      False       True        False
982639   1111          2 2013-01-01      0          0     0  No Promo       Public              1  Tue     1   Jan        1  2013      False       True        False
982638   1110          2 2013-01-01      0          0     0  No Promo       Public              1  Tue     1   Jan        1  2013      False       True        False
982637   1109          2 2013-01-01      0          0     0  No Promo       Public              1  Tue     1   Jan        1  2013      False       True        False
In [9]:
# Step 2: Feature Engineering
print("Step 2: Feature Engineering started...")
time.sleep(1)  # Simulate processing time
Step 2: Feature Engineering started...
In [10]:
# Make a copy of the original dataframe to avoid modifying it
df_features = df_viz_feat.copy()

🧩 Reusable Utility Functions for Analytical Workflows¶


Promotion Impact Analysis¶

In [11]:
def analyze_promotion_impact(df, target_col='sales', category_col='day', width=1200, height=500):
    """
    Analyze promotion impact across any category (day, month, store, etc.)
    
    Parameters:
    - df: DataFrame with 'promo' column
    - target_col: metric to analyze ('sales' or 'customers')
    - category_col: grouping category ('day', 'month', 'store', etc.)
    """
    # Filter out closed stores (e.g., sales = 0) for unbiased and fair comparison
    df = df[df[target_col] > 0].copy()
    
    # Create summary data
    summary = df.groupby([category_col, 'promo'])[target_col].mean().reset_index()
    
    # Visualization
    fig = px.bar(
        summary,
        x=category_col,
        y=target_col,
        color='promo',
        title=f'Promotion Impact: {target_col.title()} by {category_col.title()}',
        color_discrete_map={'Promo': '#636EFA', 'No Promo': '#EF553B'},
        barmode='group'
    )
    fig.update_layout(title_x=0.5, height=height, width=width)
    fig.show(config={'displayModeBar': True, 'displaylogo': False})
    
    # Impact analysis
    print(f"Promotion Impact Analysis - {target_col.title()} by {category_col.title()}:")
    print("=" * 60)
    
    categories = sorted(df[category_col].unique())
    for category in categories:
        no_promo = df[(df[category_col] == category) & (df['promo'] == 'No Promo')][target_col].mean()
        promo = df[(df[category_col] == category) & (df['promo'] == 'Promo')][target_col].mean()
        
        if not pd.isna(no_promo) and not pd.isna(promo):
            lift = ((promo - no_promo) / no_promo) * 100
            currency = "€" if target_col == 'sales' else ""
            print(f"{str(category):12}: No Promo {currency}{no_promo:6,.0f} | Promo {currency}{promo:6,.0f} | Lift {lift:+5.1f}%")
    
    # Overall summary
    overall_no_promo = df[df['promo'] == 'No Promo'][target_col].mean()
    overall_promo = df[df['promo'] == 'Promo'][target_col].mean()
    overall_lift = ((overall_promo - overall_no_promo) / overall_no_promo) * 100
    currency = "€" if target_col == 'sales' else ""
    
    print(f"\nOverall Impact:")
    print(f"Average lift from promotions: {overall_lift:+.1f}%")
    metric_label = "revenue" if target_col == 'sales' else target_col
    print(f"Additional {metric_label} per day: {currency}{overall_promo - overall_no_promo:,.0f}")
    
    return summary
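As a sanity check, the lift arithmetic above can be exercised on a tiny hypothetical frame (made-up numbers, not the Rossmann data):

```python
import pandas as pd

# Hypothetical mini-frame to verify the promo-lift formula
toy = pd.DataFrame({
    'day':   ['Mon'] * 4,
    'promo': ['No Promo', 'No Promo', 'Promo', 'Promo'],
    'sales': [6000, 6400, 9600, 9952],
})

no_promo = toy.loc[toy['promo'] == 'No Promo', 'sales'].mean()  # 6200.0
promo    = toy.loc[toy['promo'] == 'Promo', 'sales'].mean()     # 9776.0
lift = (promo - no_promo) / no_promo * 100
print(f"Lift: {lift:+.1f}%")  # Lift: +57.7%
```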

Top Performer Impact Analysis¶

In [12]:
def analyze_top_performers(df, group_col='store', target_col='sales', top_n=10, width=1200, height=500):
    """
    Analyze top performing entities (stores, days, months, etc.)
    
    Parameters:
    - df: DataFrame 
    - group_col: column to group by ('store', 'day', 'month', etc.)
    - target_col: metric to analyze ('sales' or 'customers')  
    - top_n: number of top performers to show
    """
    
    # Filter out closed stores (e.g., sales = 0) for unbiased and fair comparison
    df = df[df[target_col] > 0].copy()
    
    # Calculate averages and get top performers
    group_avg = df.groupby(group_col)[target_col].mean()
    top_performers = group_avg.nlargest(top_n)
    
    # Visualization
    fig = px.bar(
        x=top_performers.index.astype(str),
        y=top_performers.values,
        title=f'Top {top_n} {group_col.title()} by Average {target_col.title()}',
        labels={'x': group_col.title(), 'y': f'Average {target_col.title()}'}
    )
    fig.update_layout(title_x=0.5, height=height, width=width)
    fig.show(config={'displayModeBar': True, 'displaylogo': False})
    
    # Performance analysis
    print(f"Top {top_n} {group_col.title()} Performance Analysis:")
    print("=" * 55)
    print(f"{'Rank':<4} {group_col.title():<10} {'Average':<15} {'% of #1':<10}")
    print("-" * 55)
    
    for i, (entity, avg_value) in enumerate(top_performers.items(), 1):
        pct_of_top = (avg_value / top_performers.iloc[0]) * 100
        currency = "€" if target_col == 'sales' else ""
        print(f"{i:<4} {str(entity):<10} {currency}{avg_value:>9,.0f}     {pct_of_top:>6.1f}%")
    
    # Summary statistics
    total_entities = len(group_avg)
    top_avg = top_performers.mean()
    overall_avg = group_avg.mean()
    performance_gap = ((top_avg - overall_avg) / overall_avg) * 100
    
    print(f"\nSummary Statistics:")
    print(f"Total {group_col}s analyzed: {total_entities:,}")
    print(f"Top {top_n} average: {currency}{top_avg:,.0f}")
    print(f"Overall average: {currency}{overall_avg:,.0f}")
    print(f"Top {top_n} outperform by: {performance_gap:.1f}%")
    
    return top_performers
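The ranking step boils down to `groupby(...).mean()` plus `nlargest`; a minimal sketch on invented store numbers:

```python
import pandas as pd

# Hypothetical per-day sales for three stores
df = pd.DataFrame({
    'store': [1, 1, 2, 2, 3, 3],
    'sales': [100, 200, 400, 600, 50, 150],
})

group_avg = df.groupby('store')['sales'].mean()   # store 1: 150, 2: 500, 3: 100
top2 = group_avg.nlargest(2)                      # stores 2 and 1
pct_of_top = (top2 / top2.iloc[0] * 100).round(1)
print(top2.index.tolist())   # [2, 1]
print(pct_of_top.tolist())   # [100.0, 30.0]
```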

Temporal Trends Impact Analysis¶

In [13]:
def analyze_temporal_trends(df, time_col='month', target_col='sales', width=1200, height=500):
    """
    Analyze trends over time periods ('month', 'day', 'year', etc.)
    
    Parameters:
    - df: DataFrame
    - time_col: time dimension ('month', 'day', 'year', etc.)
    - target_col: metric to analyze ('sales' or 'customers')
    """
    # Filter out closed stores (e.g., sales = 0) for unbiased and fair comparison
    df = df[df[target_col] > 0].copy()

    # Calculate averages by time period
    time_stats = df.groupby(time_col).agg({
        target_col: ['mean', 'std', 'count']
    }).round(0)
    
    time_stats.columns = ['avg', 'std', 'count']
    time_stats = time_stats.reset_index().sort_values('avg', ascending=False)
    
    # Visualization
    fig = px.bar(
        time_stats,
        x=time_col,
        y='avg',
        title=f'{target_col.title()} Performance by {time_col.title()}',
        color='avg',
        color_continuous_scale='viridis'
    )
    fig.update_layout(title_x=0.5, height=height, width=width)
    fig.show(config={'displayModeBar': True, 'displaylogo': False})
    
    # Trend analysis
    print(f"{time_col.title()} Performance Analysis:")
    print("=" * 50)
    print(f"{'Rank':<4} {time_col.title():<12} {'Average':<15} {'Std Dev':<10} {'Count':<8}")
    print("-" * 60)
    
    # Note: enumerate gives the true rank; the pre-sort DataFrame index would not
    for rank, (_, row) in enumerate(time_stats.iterrows(), 1):
        currency = "€" if target_col == 'sales' else ""
        print(f"{rank:<4} {str(row[time_col]):<12} {currency}{row['avg']:>9,.0f}     {currency}{row['std']:>6,.0f}   {row['count']:>6,.0f}")
    
    # Key insights
    best_period = time_stats.iloc[0][time_col]
    worst_period = time_stats.iloc[-1][time_col]
    best_value = time_stats.iloc[0]['avg']
    worst_value = time_stats.iloc[-1]['avg']
    volatility = ((best_value - worst_value) / time_stats['avg'].mean()) * 100
    
    currency = "€" if target_col == 'sales' else ""
    print(f"\nKey Insights:")
    print(f"Best {time_col}: {best_period} ({currency}{best_value:,.0f})")
    print(f"Worst {time_col}: {worst_period} ({currency}{worst_value:,.0f})")
    print(f"Performance range: {currency}{best_value - worst_value:,.0f}")
    print(f"Volatility: {volatility:.1f}%")
    
    return time_stats
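The mean/std/count aggregation and column flattening inside the function work like this on a throwaway frame (hypothetical values):

```python
import pandas as pd

# Hypothetical monthly records
df = pd.DataFrame({
    'month': ['Jan', 'Jan', 'Feb', 'Feb', 'Feb'],
    'sales': [100, 300, 400, 500, 600],
})

stats = df.groupby('month').agg({'sales': ['mean', 'std', 'count']}).round(0)
stats.columns = ['avg', 'std', 'count']          # flatten the MultiIndex columns
stats = stats.reset_index().sort_values('avg', ascending=False)

print(stats.iloc[0]['month'], stats.iloc[0]['avg'])  # Feb 500.0
print(stats.iloc[1]['std'])                          # 141.0 (sample std of 100 and 300)
```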

State Holiday Impact Analysis¶

In [14]:
def analyze_stateholiday_impact(df, target_col='sales', category_col='day', width=1200, height=500):
    """
    Analyze the impact of state holidays across a specified category (e.g., day, month, store).
    
    Parameters:
    - df: DataFrame with 'stateholiday' column already mapped to labels
    - target_col: metric to analyze ('sales' or 'customers')
    - category_col: grouping category ('day', 'month', 'store', etc.)
    """
    # Filter out closed stores (e.g., sales = 0) for unbiased and fair comparison
    df_open = df[df[target_col] > 0].copy()
    
    # Create summary data
    summary = df_open.groupby([category_col, 'stateholiday'])[target_col].mean().reset_index()
    
    # Visualization
    fig = px.bar(
        summary,
        x=category_col,
        y=target_col,
        color='stateholiday',
        title=f'State Holiday Impact: {target_col.title()} by {category_col.title()}',
        barmode='group'
    )
    fig.update_layout(title_x=0.5, height=height, width=width)
    fig.show(config={'displayModeBar': True, 'displaylogo': False})
    
    # Impact analysis
    print(f"State Holiday Impact Analysis - {target_col.title()} by {category_col.title()}:")
    print("=" * 70)
    
    # Get regular day baseline
    regular_avg = df_open[df_open['stateholiday'] == 'Normal Day'][target_col].mean()
    
    # Overall holiday impact
    print("Overall Holiday Impact:")
    print("-" * 25)
    for holiday_label in ['Public', 'Easter', 'Christmas']:
        if holiday_label in df['stateholiday'].values:
            holiday_avg = df_open[df_open['stateholiday'] == holiday_label][target_col].mean()
            if not pd.isna(holiday_avg):
                impact = ((holiday_avg - regular_avg) / regular_avg) * 100
                currency = "€" if target_col == 'sales' else ""
                print(f"{holiday_label:15}: {currency}{holiday_avg:6,.0f} ({impact:+5.1f}% vs regular)")
    
    currency = "€" if target_col == 'sales' else ""
    print(f"Regular Days: {currency}{regular_avg:6,.0f} (baseline)")
    
    # Category-wise analysis
    print(f"\nHoliday Impact by {category_col.title()}:")
    print("-" * 35)
    
    categories = sorted(df[category_col].unique())
    for category in categories:
        regular = df_open[(df_open[category_col] == category) & (df_open['stateholiday'] == 'Normal Day')][target_col].mean()
        holiday = df_open[(df_open[category_col] == category) & (df_open['stateholiday'] != 'Normal Day')][target_col].mean()
        
        if not pd.isna(regular):
            currency = "€" if target_col == 'sales' else ""
            if not pd.isna(holiday):
                impact = ((holiday - regular) / regular) * 100
                print(f"{str(category):12}: Regular {currency}{regular:6,.0f} | Holiday {currency}{holiday:6,.0f} | Impact {impact:+5.1f}%")
            else:
                print(f"{str(category):12}: Regular {currency}{regular:6,.0f} | No holiday data")
    
    # Store closure analysis
    total_records = len(df)
    closed_stores = len(df[df[target_col] == 0])
    holiday_closures = len(df[(df['stateholiday'] != 'Normal Day') & (df[target_col] == 0)])
    regular_closures = len(df[(df['stateholiday'] == 'Normal Day') & (df[target_col] == 0)])
    
    print(f"\nStore Operations Impact:")
    print("-" * 25)
    print(f"Total store closures: {closed_stores:,} ({(closed_stores/total_records)*100:.1f}%)")
    print(f"Holiday closures: {holiday_closures:,}")
    print(f"Regular closures: {regular_closures:,}")
    
    return summary

RUNNING TOP 4 MOST IMPORTANT ANALYSES¶

In [15]:
# Example usage:
if __name__ == "__main__":
    # Assuming df_features is your dataset
    
    print("=== RUNNING TOP 4 MOST IMPORTANT ANALYSES ===\n")
    
    # 1. Promotion Impact Analysis
    print("1. PROMOTION IMPACT ANALYSIS")
    print("-" * 40)
    promo_results = analyze_promotion_impact(df_features, target_col='sales', category_col='day')
    
    print("\n" + "="*80 + "\n")
    
    # 2. State Holiday Impact Analysis
    print("2. STATE HOLIDAY IMPACT ANALYSIS")
    print("-" * 40)
    holiday_results = analyze_stateholiday_impact(df_features, target_col='sales', category_col='day')
    
    print("\n" + "="*80 + "\n")
    
    # 3. Top Store Performance Analysis  
    print("3. TOP STORE PERFORMANCE ANALYSIS")
    print("-" * 40)
    store_results = analyze_top_performers(df_features, group_col='store', target_col='sales', top_n=10)
    
    print("\n" + "="*80 + "\n")
    
    # 4. Temporal Trends Analysis
    print("4. TEMPORAL TRENDS ANALYSIS")
    print("-" * 40)
    trend_results = analyze_temporal_trends(df_features, time_col='month', target_col='sales')
=== RUNNING TOP 4 MOST IMPORTANT ANALYSES ===

1. PROMOTION IMPACT ANALYSIS
----------------------------------------
Promotion Impact Analysis - Sales by Day:
============================================================
Fri         : No Promo € 6,352 | Promo € 7,734 | Lift +21.8%
Mon         : No Promo € 6,205 | Promo € 9,776 | Lift +57.5%
Thu         : No Promo € 5,737 | Promo € 7,595 | Lift +32.4%
Tue         : No Promo € 5,716 | Promo € 8,277 | Lift +44.8%
Wed         : No Promo € 5,610 | Promo € 7,678 | Lift +36.9%

Overall Impact:
Average lift from promotions: +38.8%
Additional revenue per day: €2,302

================================================================================

2. STATE HOLIDAY IMPACT ANALYSIS
----------------------------------------
State Holiday Impact Analysis - Sales by Day:
======================================================================
Overall Holiday Impact:
-------------------------
Public         : € 8,487 (+22.1% vs regular)
Easter         : € 9,888 (+42.3% vs regular)
Christmas      : € 9,744 (+40.2% vs regular)
Regular Days: € 6,951 (baseline)

Holiday Impact by Day:
-----------------------------------
Fri         : Regular € 7,062 | Holiday €11,190 | Impact +58.5%
Mon         : Regular € 8,217 | Holiday € 8,191 | Impact  -0.3%
Sat         : Regular € 5,885 | Holiday €15,715 | Impact +167.1%
Sun         : Regular € 8,206 | Holiday € 4,977 | Impact -39.4%
Thu         : Regular € 6,750 | Holiday € 8,438 | Impact +25.0%
Tue         : Regular € 7,091 | Holiday € 6,053 | Impact -14.6%
Wed         : Regular € 6,715 | Holiday € 8,711 | Impact +29.7%

Store Operations Impact:
-------------------------
Total store closures: 168,494 (17.1%)
Holiday closures: 30,140
Regular closures: 138,354

================================================================================

3. TOP STORE PERFORMANCE ANALYSIS
----------------------------------------
Top 10 Store Performance Analysis:
=======================================================
Rank Store      Average         % of #1   
-------------------------------------------------------
1    817        €   21,789      100.0%
2    262        €   20,684       94.9%
3    1114       €   20,576       94.4%
4    251        €   19,158       87.9%
5    842        €   18,472       84.8%
6    513        €   18,212       83.6%
7    562        €   17,985       82.5%
8    788        €   17,968       82.5%
9    383        €   17,293       79.4%
10   756        €   16,575       76.1%

Summary Statistics:
Total stores analyzed: 1,115
Top 10 average: €18,871
Overall average: €6,931
Top 10 outperform by: 172.3%

================================================================================

4. TEMPORAL TRENDS ANALYSIS
----------------------------------------
Month Performance Analysis:
==================================================
Rank Month        Average         Std Dev    Count   
------------------------------------------------------------
1    Dec          €    8,609     € 4,094   50,393
2    Nov          €    7,189     € 3,010   51,401
3    May          €    7,107     € 3,008   80,099
4    Apr          €    7,047     € 3,138   81,726
5    Jun          €    7,001     € 3,145   82,571
6    Mar          €    6,977     € 3,124   85,975
7    Jul          €    6,910     € 3,038   55,388
8    Aug          €    6,649     € 2,902   54,411
9    Oct          €    6,603     € 2,815   53,291
10   Feb          €    6,589     € 2,871   80,239
11   Jan          €    6,564     € 2,849   86,335
12   Sep          €    6,547     € 2,889   52,321

Key Insights:
Best month: Dec (€8,609)
Worst month: Sep (€6,547)
Performance range: €2,062
Volatility: 29.5%

Top 5 performing days¶

In [16]:
# Top 5 performing days
analyze_top_performers(df_features, 'day', 'sales', 5)
Top 5 Day Performance Analysis:
=======================================================
Rank Day        Average         % of #1   
-------------------------------------------------------
1    Mon        €    8,217      100.0%
2    Sun        €    8,205       99.8%
3    Tue        €    7,091       86.3%
4    Fri        €    7,066       86.0%
5    Thu        €    6,756       82.2%

Summary Statistics:
Total days analyzed: 7
Top 5 average: €7,467
Overall average: €7,134
Top 5 outperform by: 4.7%
Out[16]:
day
Mon   8217.44
Sun   8204.63
Tue   7090.99
Fri   7066.37
Thu   6756.03
Name: sales, dtype: float64

Yearly sales trends¶

In [17]:
# Yearly sales trends
analyze_temporal_trends(df_features, 'year', 'sales')
Year Performance Analysis:
==================================================
Rank Year         Average         Std Dev    Count   
------------------------------------------------------------
1    2015.0       €    7,098     € 3,051   165,841
2    2014.0       €    7,026     € 3,129   310,385
3    2013.0       €    6,815     € 3,115   337,924

Key Insights:
Best year: 2015.0 (€7,098)
Worst year: 2013.0 (€6,815)
Performance range: €283
Volatility: 4.1%
Out[17]:
   year     avg     std   count
2  2015 7098.00 3051.00  165841
1  2014 7026.00 3129.00  310385
0  2013 6815.00 3115.00  337924

Promotion impact on customers by month¶

In [18]:
# Promotion impact on customers by month
analyze_promotion_impact(df_features, 'customers', 'month')
Promotion Impact Analysis - Customers by Month:
============================================================
Apr         : No Promo    698 | Promo    863 | Lift +23.7%
Aug         : No Promo    684 | Promo    838 | Lift +22.6%
Dec         : No Promo    810 | Promo    998 | Lift +23.2%
Feb         : No Promo    679 | Promo    805 | Lift +18.5%
Jan         : No Promo    661 | Promo    799 | Lift +20.9%
Jul         : No Promo    675 | Promo    855 | Lift +26.7%
Jun         : No Promo    685 | Promo    858 | Lift +25.3%
Mar         : No Promo    681 | Promo    845 | Lift +24.1%
May         : No Promo    728 | Promo    842 | Lift +15.6%
Nov         : No Promo    723 | Promo    845 | Lift +17.0%
Oct         : No Promo    705 | Promo    819 | Lift +16.2%
Sep         : No Promo    679 | Promo    835 | Lift +22.9%

Overall Impact:
Average lift from promotions: +21.3%
Additional customers per day: 148
Out[18]:
month promo customers
0 Apr No Promo 697.81
1 Apr Promo 862.93
2 Aug No Promo 683.70
3 Aug Promo 838.39
4 Dec No Promo 810.09
5 Dec Promo 998.24
6 Feb No Promo 679.25
7 Feb Promo 805.11
8 Jan No Promo 661.39
9 Jan Promo 799.34
10 Jul No Promo 675.14
11 Jul Promo 855.10
12 Jun No Promo 685.07
13 Jun Promo 858.47
14 Mar No Promo 680.72
15 Mar Promo 844.62
16 May No Promo 728.14
17 May Promo 841.68
18 Nov No Promo 722.73
19 Nov Promo 845.48
20 Oct No Promo 704.59
21 Oct Promo 818.75
22 Sep No Promo 679.46
23 Sep Promo 834.98

Holiday impact on customers by month¶

In [19]:
# Holiday impact on customers by month
analyze_stateholiday_impact(df_features, 'customers', 'month')
State Holiday Impact Analysis - Customers by Month:
======================================================================
Overall Holiday Impact:
-------------------------
Public         :  1,279 (+67.6% vs regular)
Easter         :  1,687 (+121.1% vs regular)
Christmas      :  1,569 (+105.7% vs regular)
Regular Days:    763 (baseline)

Holiday Impact by Month:
-----------------------------------
Apr         : Regular    773 | Holiday  1,619 | Impact +109.4%
Aug         : Regular    750 | Holiday    755 | Impact  +0.6%
Dec         : Regular    886 | Holiday  1,569 | Impact +77.2%
Feb         : Regular    732 | No holiday data
Jan         : Regular    722 | Holiday  1,249 | Impact +73.0%
Jul         : Regular    761 | No holiday data
Jun         : Regular    758 | Holiday  1,096 | Impact +44.5%
Mar         : Regular    759 | Holiday  2,236 | Impact +194.6%
May         : Regular    777 | Holiday  1,418 | Impact +82.6%
Nov         : Regular    782 | Holiday  2,578 | Impact +229.5%
Oct         : Regular    752 | Holiday  1,212 | Impact +61.3%
Sep         : Regular    746 | No holiday data

Store Operations Impact:
-------------------------
Total store closures: 168,492 (17.1%)
Holiday closures: 30,140
Regular closures: 138,352
Out[19]:
month stateholiday customers
0 Apr Easter 1618.79
1 Apr Normal Day 772.97
2 Aug Normal Day 750.41
3 Aug Public 754.88
4 Dec Christmas 1569.23
5 Dec Normal Day 885.67
6 Feb Normal Day 731.54
7 Jan Normal Day 722.21
8 Jan Public 1249.49
9 Jul Normal Day 761.38
10 Jun Normal Day 758.48
11 Jun Public 1095.97
12 Mar Easter 2235.94
13 Mar Normal Day 758.85
14 May Normal Day 776.91
15 May Public 1418.38
16 Nov Normal Day 782.22
17 Nov Public 2577.62
18 Oct Normal Day 751.84
19 Oct Public 1212.47
20 Sep Normal Day 745.74
In [20]:
print("✅ Data Visualization Impact Analysis completed.\n")
✅ Data Visualization Impact Analysis completed.

In [21]:
print("✅ Feature Engineering and Data Visualization (I) completed successfully!")
print(f"🗓️ Analysis Date: {bold_start}{pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')}{bold_end}")
✅ Feature Engineering and Data Visualization (I) completed successfully!
🗓️ Analysis Date: 2025-08-13 21:23:12

🌟 Advantages¶

  • Reusable across 'year', 'month', 'dayofweek', etc.

  • Easy to change aggregation type ('sum', 'median', etc.)

  • Consistent naming and sorting

  • Makes your code far more modular for dashboards or reporting
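For instance, swapping the aggregation is a one-argument change in a helper along these lines (a sketch with illustrative names and made-up data, not code from the notebook above):

```python
import pandas as pd

def summarize(df, group_col, target_col='sales', agg='mean'):
    """Reusable grouped summary: pass 'sum', 'median', etc. as agg."""
    return df.groupby(group_col)[target_col].agg(agg).sort_values(ascending=False)

# Hypothetical usage
df = pd.DataFrame({'year': [2013, 2013, 2014], 'sales': [10, 20, 40]})
print(summarize(df, 'year', agg='sum').to_dict())     # {2014: 40, 2013: 30}
print(summarize(df, 'year', agg='median').to_dict())  # {2014: 40.0, 2013: 15.0}
```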

Why Reusability Matters¶


  • 💡 Scalability: You can plug your functions into larger pipelines or production environments without rewrites.
  • 🛠️ Maintainability: A bug fix in one utility can instantly improve multiple workflows.
  • 🚀 Efficiency: Spend less time rewriting logic and more time interpreting results.

Why This Matters for Rossmann Store Sales¶

  • We’ll likely repeat the same aggregations or visualizations across hundreds of stores.
  • Promos, holidays, and weekday patterns demand consistent filtering and analysis.
  • Modular functions help you prototype insights fast, scale across stores, and iterate smoothly.
In [22]:
# End analysis
viz_impact_analysis_end = pd.Timestamp.now()
duration = viz_impact_analysis_end - viz_impact_analysis_begin

# Final summary print
print("\n📋 Feature Engineering & Data Viz Summary")
print(f"🟢 Begin Date: {bold_start}{viz_impact_analysis_begin.strftime('%Y-%m-%d %H:%M:%S')}{bold_end}")
print(f"✅ End Date:   {bold_start}{viz_impact_analysis_end.strftime('%Y-%m-%d %H:%M:%S')}{bold_end}")
print(f"⏱️ Duration:   {bold_start}{str(duration)}{bold_end}")
📋 Feature Engineering & Data Viz Summary
🟢 Begin Date: 2025-08-13 21:23:00
✅ End Date:   2025-08-13 21:23:12
⏱️ Duration:   0 days 00:00:12.616810

Project Design Rationale: Notebook Separation¶

To promote clarity, maintainability, and scalability within the project, data engineering and visualization tasks are intentionally separated into distinct notebooks. This modular approach prevents the accumulation of excessive code in a single notebook, making it easier to debug, update, and collaborate across different stages of the workflow. By isolating data transformation logic from visual analysis, each notebook remains focused and purpose-driven, ultimately enhancing the overall efficiency and readability of the project.